Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition
Authors
Abstract
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcription still suffer from segmentation and punctuation problems resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. Context within the segments produced by ASR decoders can be helpful, but it limits overall punctuation performance across a continuous session. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows, and we measure its impact on punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation issues, improving the F0.5-score by 13.9%. Streaming punctuation achieves an average BLEU score improvement of 0.66 on the downstream task of Machine Translation (MT).
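The abstract reports its segmentation result as an F0.5-score. As a reminder of what that metric weighs, the following is a minimal sketch of the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta = 0.5 counts precision more heavily than recall. The function name `f_beta` and the precision and recall values are illustrative assumptions and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta = 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, chosen only to show the effect of beta.
p, r = 0.8, 0.5
print(f"F0.5 = {f_beta(p, r, 0.5):.4f}")
print(f"F1   = {f_beta(p, r, 1.0):.4f}")
```

With precision higher than recall, as in these made-up values, F0.5 comes out above F1. In this sense the metric penalizes spurious sentence boundaries (over-segmentation) more than missed ones.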
Similar resources
Speech recognition with automatic punctuation
We present a method of speech recognition with automatic punctuation based on a combination of acoustic and lexical evidence. In the recognizer vocabulary, punctuation marks are treated as word entries. By assigning the acoustic baseforms of silence, breath, and other non-speech sounds to punctuation marks, and using a properly processed N-gram language model, unpronounced punctuation marks of ...
Recovering punctuation marks for automatic speech recognition
This paper shows results of recovering punctuation over speech transcriptions for a Portuguese broadcast news corpus. The approach is based on maximum entropy models and uses word, part-of-speech, time and speaker information. The contribution of each type of feature is analyzed individually. Separate results for each focus condition are given, making it possible to analyze the differences of p...
Punctuation in Quoted Speech
Quoted speech is often set off by punctuation marks, in particular quotation marks. Thus, it might seem that the quotation marks would be extremely useful in identifying these structures in texts. Unfortunately, the situation is not quite so clear. In this work, I will argue that quotation marks are not adequate for either identifying or constraining the syntax of quoted speech. More useful inf...
LSTM for punctuation restoration in speech transcripts
The output of automatic speech recognition systems is generally an unpunctuated stream of words which is hard to process for both humans and machines. We present a two-stage recurrent neural network based model using long short-term memory units to restore punctuation in speech transcripts. In the first stage, textual features are learned on a large text corpus. The second stage combines textua...
Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration
Automatic speech recognition systems generally produce unpunctuated text which is difficult to read for humans and degrades the performance of many downstream machine processing tasks. This paper introduces a bidirectional recurrent neural network model with attention mechanism for punctuation restoration in unsegmented text. The model can utilize long contexts in both directions and direct att...
Journal
Journal title: International journal on natural language computing
Year: 2022
ISSN: 2278-1307, 2319-4111
DOI: https://doi.org/10.5121/ijnlc.2022.11601